A Simple Sampling Method for Estimating the Accuracy of Large Scale Record Linkage Projects.
Abstract
BACKGROUND: Record linkage techniques allow different data collections to be brought together to provide a wider picture of the health status of individuals. Ensuring high linkage quality is important to guarantee the quality and integrity of research. Current methods for measuring linkage quality typically focus on precision (the proportion of accepted links that are correct; one minus precision is the proportion of incorrect links), because the proportion of false negatives is difficult to measure.
OBJECTIVES: The aim of this work is to introduce and evaluate a sampling-based method for estimating both precision and recall following record linkage.
METHODS: In the sampling-based method, record pairs are sampled from each threshold (including thresholds below the cut-off identified for accepting links) and clerically reviewed. The review results are then applied to the entire set of record pairs, providing estimates of false positives and false negatives. The method was evaluated on a synthetically generated dataset in which the true match status (which records belonged to the same person) was known.
RESULTS: The sampled estimates of linkage quality were relatively close to the actual linkage quality metrics calculated for the whole synthetic dataset. The precision and recall measures for seven reviewers were very consistent, with little variation in the clerical assessment results (overall agreement, measured with Fleiss' kappa, was 0.601).
CONCLUSIONS: This method offers a possible means of accurately estimating matching quality and refining linkages in population-level linkage studies. The sampling approach is especially important for large linkage projects, where the number of record pairs produced may be very large, often running into the millions.
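The estimation step described under METHODS can be read as a stratified extrapolation: the match rate observed in each threshold's reviewed sample is scaled up to all record pairs at that threshold, and the scaled counts are split into true positives, false positives and false negatives around the acceptance cut-off. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes each threshold is a score band with a known total number of pairs, that each band's sample is random so its reviewed match rate represents the whole band, and that pairs scoring at or above the cut-off are accepted. The names `ThresholdBin` and `estimate_linkage_quality` are hypothetical, and the numbers in the usage block are illustrative, not data from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ThresholdBin:
    """Hypothetical record of one linkage-score threshold (score band)."""
    score: float          # linkage score that defines this band
    total_pairs: int      # all record pairs produced at this score
    sampled_pairs: int    # pairs drawn at random for clerical review
    sampled_matches: int  # sampled pairs the reviewer judged to be true matches


def estimate_linkage_quality(
    bins: Iterable[ThresholdBin], cutoff: float
) -> Tuple[float, float, float, float, float]:
    """Scale each band's reviewed match rate up to the whole band.

    Assumption: bands scoring at or above `cutoff` are accepted links and
    bands below it are rejected. Returns
    (precision, recall, est_tp, est_fp, est_fn).
    """
    est_tp = est_fp = est_fn = 0.0
    for b in bins:
        if b.sampled_pairs <= 0 or not 0 <= b.sampled_matches <= b.sampled_pairs:
            raise ValueError(f"invalid review sample for score {b.score}")
        # Estimated number of true matches among all pairs in this band.
        est_matches = b.total_pairs * b.sampled_matches / b.sampled_pairs
        if b.score >= cutoff:
            est_tp += est_matches                  # accepted and correct
            est_fp += b.total_pairs - est_matches  # accepted but incorrect
        else:
            est_fn += est_matches                  # true matches below the cut-off
    nan = float("nan")
    precision = est_tp / (est_tp + est_fp) if est_tp + est_fp > 0 else nan
    recall = est_tp / (est_tp + est_fn) if est_tp + est_fn > 0 else nan
    return precision, recall, est_tp, est_fp, est_fn


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    bins = [
        ThresholdBin(score=30, total_pairs=1_000, sampled_pairs=50, sampled_matches=50),
        ThresholdBin(score=20, total_pairs=2_000, sampled_pairs=50, sampled_matches=45),
        ThresholdBin(score=10, total_pairs=4_000, sampled_pairs=50, sampled_matches=5),
        ThresholdBin(score=0, total_pairs=10_000, sampled_pairs=50, sampled_matches=0),
    ]
    p, r, tp, fp, fn = estimate_linkage_quality(bins, cutoff=15)
    print(f"TP~{tp:.0f} FP~{fp:.0f} FN~{fn:.0f} precision={p:.3f} recall={r:.3f}")
    # prints: TP~2800 FP~200 FN~400 precision=0.933 recall=0.875
```

Scaling each band's sample match rate by the band size is the simplest way to apply the review results to the entire set of pairs. It yields point estimates only; reporting the sampling uncertainty of precision and recall (for example, from per-band binomial variances) would have to be added separately.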
Similar resources
A Hybrid Intelligent Model to Increase the Accuracy of COCOMO
Nowadays, effort estimation in software projects has become one of the key concerns for project managers. In fact, accurately estimating the effort required to produce and improve a software product is a vital factor in whether a software project succeeds or fails. The unsatisfactory accuracy and limited flexibility of existing estimation models have attracted ...
Identification of Pattern used in Determination of Critical Success Factors in ITS Projects, Case Study: Road Maintenance and Transportation Organization
One of the risks recognized by the relevant authorities is the risk of outsourcing ITS projects. The purpose of this study was to design and explain a pattern for determining the critical success factors in outsourcing large-scale ITS projects in the Ministry of Roads and Urban Development (Road Maintenance and Transportation Organization). The study used a qualitative method. The pa...
An Improved Algorithmic Method for Software Development Effort Estimation
Accurate estimation is one of the most important activities in software project management. Different aspects of software projects must be estimated, among which time and effort are particularly important for efficient project planning. Due to the complexity of software projects and the lack of information at the early stages of a project, reliable effort estimation is a challenging issue. ...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of them loses the information available in the others. Hence, the scattered information must be integrated into a single comprehensive data set. On the other hand, we are sometimes interested in identifying duplicates within a data set. The i...
Journal: Methods of Information in Medicine
Volume 55, Issue 3
Publication date: 2016